Add XML generation, preview, and packaging for markup_doc #66
Open
eduranm wants to merge 20 commits into scieloorg:main
Pull request overview
This PR adds JATS XML generation to the markup_doc workflow, together with XML regeneration on edit, preview (rendered HTML and XML tree), and SPS packaging as a ZIP inside the Wagtail/Django admin.
Changes:
- Adds a JATS XML generator and the associated logic for references, figures, tables, and formulas.
- Exposes endpoints/admin URLs to download the XML, preview the HTML and the XML tree, and download the SPS ZIP.
- Implements utilities to extract content from DOCX files and to build the SPS ZIP package with packtools.
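As a rough illustration of what generating JATS XML involves, the sketch below builds a minimal JATS `<article>` skeleton with Python's standard `xml.etree.ElementTree`. It is not the PR's `markup_doc/xml.py`: the function name `build_jats_skeleton` and its inputs are hypothetical, and it omits the DOCTYPE, namespace declarations, and SciELO PS-specific attributes a real package would need.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_jats_skeleton(title: str, doi: str, paragraphs: list[str]) -> str:
    """Return a minimal JATS article as a string (hypothetical helper)."""
    article = ET.Element(
        "article",
        {"article-type": "research-article", "dtd-version": "1.1"},
    )
    # <front>/<article-meta>: article-level metadata (DOI and title)
    front = ET.SubElement(article, "front")
    article_meta = ET.SubElement(front, "article-meta")
    doi_el = ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"})
    doi_el.text = doi
    title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    # <body>: one <p> element per paragraph of running text
    body = ET.SubElement(article, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text

    return ET.tostring(article, encoding="unicode")
```

The real generator works from StreamFields and record metadata, so in practice each block type (references, figures, tables, formulas) would map to its own JATS element rather than a flat list of paragraphs.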
Reviewed changes
Copilot reviewed 27 out of 39 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| `model_ai/llama.py` | Introduces a fixed pause after Gemini calls. |
| `markuplib/function_docx.py` | New DOCX content extractor (text, tables, images, formulas) into intermediate structures. |
| `markuplib/__init__.py` | Package initialization. |
| `markup_doc/xml.py` | New JATS XML generator built from StreamFields and the record's metadata. |
| `markup_doc/wagtail_hooks.py` | Registers admin URLs and hooks; triggers XML regeneration on edit. |
| `markup_doc/views.py` | Views for XML download, citation extraction, journal lookup, HTML/XML-tree preview, and the SPS ZIP. |
| `markup_doc/tests.py` | Test file (placeholder). |
| `markup_doc/tasks.py` | Celery tasks for labeling/generation and XML regeneration. |
| `markup_doc/sync_api.py` | Syncs the collection and journals from the SciELO core API. |
| `markup_doc/static/xsl/xml-tree.xsl` | XSL for a tree-style view of the XML. |
| `markup_doc/static/js/xref-button.js` | Admin buttons/actions for inserting `<xref>`, opening previews, and downloading the ZIP. |
| `markup_doc/static/jats/jats-preview.css` | CSS for the JATS preview. |
| `markup_doc/static/css/article.css` | Additional (minified) CSS for HTML rendering. |
| `markup_doc/pkg_zip_builder.py` | SPS ZIP package builder using packtools. |
| `markup_doc/models.py` | Models, StreamFields, and panels for markup, journal fields, and the generated XML. |
| `markup_doc/migrations/__init__.py` | Migrations package initialization. |
| `markup_doc/migrations/0002_alter_articledocx_estatus_and_more.py` | Adjusts the choices/default of the status (`estatus`) field. |
| `markup_doc/marker.py` | Helper functions for LLM-based markup. |
| `markup_doc/labeling_utils.py` | Utilities for labeling and processing citations and text fragments. |
| `markup_doc/issue_proc.py` | `IssueProc` implementation for locating assets and building SPS names. |
| `markup_doc/forms.py` | Base form (placeholder). |
| `markup_doc/choices.py` | Labels and ordering for the markup workflow. |
| `markup_doc/apps.py` | AppConfig for markup_doc. |
| `markup_doc/api/v1/views.py` | DRF endpoint for `first_block` (metadata markup). |
| `markup_doc/api/v1/serializers.py` | DRF serializer for `ArticleDocx`. |
| `markup_doc/api/v1/__init__.py` | API v1 module initialization. |
| `markup_doc/api/__init__.py` | API module initialization. |
| `markup_doc/admin.py` | Admin (placeholder). |
| `markup_doc/__init__.py` | Package initialization. |
| `config/settings/base.py` | Adds markup_doc and markuplib to `INSTALLED_APPS`. |
| `config/api_router.py` | Registers the new `first_block` API route. |
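The PR builds the SPS package with packtools. The sketch below does not use packtools; it only illustrates the general shape of the output with the standard `zipfile` module, an in-memory ZIP holding the XML plus its assets. The helper `build_zip_package` and the `<package_name>.xml` naming scheme are assumptions for illustration.

```python
from __future__ import annotations

import io
import zipfile


def build_zip_package(package_name: str, xml_text: str, assets: dict[str, bytes]) -> bytes:
    """Bundle an XML file and its assets into an in-memory ZIP (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The main XML is stored under the package name, e.g. "<package_name>.xml"
        zf.writestr(f"{package_name}.xml", xml_text.encode("utf-8"))
        # Images and other renditions are stored next to it under their own names
        for filename, data in assets.items():
            zf.writestr(filename, data)
    return buffer.getvalue()
```

In a Django view, the returned bytes could be sent with `HttpResponse(data, content_type="application/zip")` plus a `Content-Disposition: attachment; filename="<package_name>.zip"` header. Building in memory avoids temporary files, at the cost of holding the whole package in RAM.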
Comment on lines +581 to +585

```python
obj['type'] = 'table'
obj['table'] = table_data

if not is_numPr:
    content.append(obj)
```
Comment on lines +77 to +86

```python
def get_journal(request):

    if request.method == "POST":
        body = json.loads(request.body)
        text = body.get("text", "")
        pk = body.get("pk", "")

        journal = JournalModel.objects.get(pk=pk)

        return JsonResponse({
```
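As written, the view assumes the body is valid JSON and that `pk` always matches an existing journal; `JournalModel.objects.get` raises `DoesNotExist` otherwise, and a non-POST request falls through without returning a response. A minimal, Django-independent sketch of validating the body first (the name `parse_journal_request` is hypothetical):

```python
from __future__ import annotations

import json


def parse_journal_request(raw_body: bytes) -> tuple[str, int]:
    """Validate the JSON body of a get_journal-style request (hypothetical helper).

    Returns (text, pk) or raises ValueError with a message suitable for a 400 response.
    """
    try:
        body = json.loads(raw_body)
    except ValueError as exc:  # covers JSONDecodeError and undecodable bytes
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(body, dict):
        raise ValueError("JSON body must be an object")

    text = body.get("text", "")
    # An empty or non-numeric pk would make the ORM lookup fail; reject it up front
    try:
        pk = int(body.get("pk"))
    except (TypeError, ValueError):
        raise ValueError("pk must be an integer") from None
    return text, pk
```

In the view, a `ValueError` could then become `JsonResponse({"error": str(exc)}, status=400)`, and a missing journal a 404 (for example via `django.shortcuts.get_object_or_404`).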
Comment on lines +381 to +383

```python
if vals[0]:
    node_tmp2 = etree.SubElement(node_tmp, 'title')
    append_fragment(node_tmp2, vals[0].value.get('paragraph'))
```
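`append_fragment` is not shown in the snippet. Assuming it inserts a string of inline markup (for example, text containing `<italic>` tags) into an element, a standard-library version could look like the sketch below; it is a hypothetical reimplementation, not the PR's code. Separately, `vals[0]` raises `IndexError` when `vals` is empty, so a guard such as `if vals and vals[0]:` may be safer.

```python
import xml.etree.ElementTree as ET


def append_fragment(parent: ET.Element, fragment: str) -> None:
    """Append a string of inline XML markup to `parent` (hypothetical reimplementation).

    The fragment is wrapped in a dummy root so mixed content such as
    'A <italic>B</italic> C' parses; it must be well-formed (e.g. '&' escaped).
    """
    if not fragment:
        return
    wrapper = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    # Leading text goes after whatever the parent already contains
    if wrapper.text:
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + wrapper.text
        else:
            parent.text = (parent.text or "") + wrapper.text
    # Move the parsed children (each keeps its own tail text) under the parent
    for child in list(wrapper):
        parent.append(child)
```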
Comment on lines +705 to +709

```python
#if re.search(r'^\[style name="italic"\](.*?)\[/style\]$', val['value']):
if re.search(r'^<italic>(.*?)</italic>$', val['value']):
    node_title.text = ''
    #ph = val['value'].replace('[style name="italic"]', '').replace('[/style]', '')
    ph = val['value']
```
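The anchored pattern `^<italic>(.*?)</italic>$` also matches values that merely start and end with italic runs: the `$` anchor forces the lazy group to expand across the middle, so `<italic>a</italic> and <italic>b</italic>` matches with group 1 equal to `a</italic> and <italic>b`. A sketch of a stricter check that parses the value instead, assuming the value is well-formed inline XML (`is_fully_italic` is a hypothetical name):

```python
import re
import xml.etree.ElementTree as ET

# The pattern used in the PR snippet, kept here to show the false positive
ITALIC_RE = re.compile(r"^<italic>(.*?)</italic>$")


def is_fully_italic(value: str) -> bool:
    """True only if the whole value is one <italic> element with no text around it."""
    try:
        wrapper = ET.fromstring(f"<wrapper>{value}</wrapper>")
    except ET.ParseError:
        return False
    children = list(wrapper)
    return (
        len(children) == 1
        and children[0].tag == "italic"
        and not (wrapper.text or "").strip()
        and not (children[0].tail or "").strip()
    )
```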
Comment on lines +101 to +103

```python
response_gemini = model.generate_content(user_input).text
time.sleep(15)
return response_gemini
```
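A fixed `time.sleep(15)` after every Gemini call adds 15 seconds even when the API is not throttling. One alternative, sketched under the assumption that rate limiting surfaces as an exception, is to retry with exponential backoff only on failure. `call_with_backoff` is a hypothetical helper; which exception types to retry depends on the client library, so it is left as the `retry_on` parameter.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**attempt (plus jitter) and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Usage would look like `response_gemini = call_with_backoff(lambda: model.generate_content(user_input).text)`. If the pause exists to stay under a requests-per-minute quota, a client-side rate limiter may still be needed, since backoff only reacts after an error.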
Comment on lines +371 to +375

```python
if is_numPr:
    numPr = paragraph.find('.//w:numPr', namespaces=paragraph.nsmap)
    numId = numPr.find('.//w:numId', namespaces=paragraph.nsmap).get(namespaces_p + 'val')
    type = [(key, objt) for key, objt in list_types.items() if objt['numId'] == numId]
```
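In this snippet, `numPr.find(...)` returns `None` when a paragraph has no `w:numId` child, so the chained `.get(...)` would raise `AttributeError`; the variable name `type` also shadows the built-in. A standard-library sketch, assuming the usual WordprocessingML namespace for the `w:` prefix (the PR appears to use lxml with `paragraph.nsmap`; `get_num_id` is a hypothetical name):

```python
import xml.etree.ElementTree as ET
from typing import Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NSMAP = {"w": W_NS}


def get_num_id(paragraph: ET.Element) -> Optional[str]:
    """Return the w:val of w:numPr/w:numId for a list paragraph, or None if not numbered."""
    num_id = paragraph.find(".//w:numPr/w:numId", NSMAP)
    if num_id is None:
        return None
    # Namespaced attributes are keyed as "{namespace}local-name"
    return num_id.get(f"{{{W_NS}}}val")
```

The list lookup could then use a non-shadowing name, e.g. `list_type = [(key, objt) for key, objt in list_types.items() if objt['numId'] == num_id]`, and skip it entirely when `num_id` is `None`.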
Comment on lines +77 to +81

```python
except Exception as e:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    self.components[rendition.original_name] = {
        "failures": format_traceback(exc_traceback),
    }
```
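`format_traceback` is not shown in the snippet. If its job is to render the current exception as text, the standard library's `traceback.format_exc()` already does that from inside an `except` block and also includes the exception type and message, which a formatter that only receives the traceback object can lose. A minimal sketch; the `components` dict and the `record_failure` helper are hypothetical stand-ins for the builder's state:

```python
import traceback


def record_failure(components: dict, name: str, func, *args, **kwargs) -> None:
    """Run func; on error, store the formatted traceback under components[name]."""
    try:
        func(*args, **kwargs)
    except Exception:
        # format_exc() returns the full traceback text, ending with "Type: message"
        components[name] = {"failures": traceback.format_exc()}
```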
Comment on lines +414 to +418

```python
def create(cls, title, doi):
    obj = cls()
    obj.title = title
    obj.doi = doi
    obj.save()
```
Comment on lines +27 to +29

```python
from django.utils.html import format_html
from wagtail.admin import messages
from wagtail.admin.views import generic
```

```python
try:
    obj = cls.get(title=title)
except (cls.DoesNotExist, ValueError):
    pass
```
What does this PR do?
Adds XML output to the markup_doc workflow, its regeneration from the editor, a preview of the result, and SPS packaging as a ZIP. It includes:
Where could the review start?
Commit by commit.
How could this be tested manually?
1. Start the environment.
2. Upload or edit a document.
3. Check that the XML is generated or regenerated.
4. Test downloading the XML.
5. Open the HTML preview and the XML tree view.
6. Generate and download the SPS ZIP.
7. Confirm that the editing workflow still works correctly.
Any context you would like to give?
This PR focuses on the output side of the markup workflow: XML generation, regeneration from edits, preview, and final packaging.
Screenshots
N/A
Which tickets are relevant?
#65
References